[SPARK-25061][SQL] Precedence for ThriftServer hiveconf commandline parameter #27041
Conversation
I think @yhuai wrote the comment above about why it's processed in that order, so he may be able to review better.
gentle ping @yhuai @dongjoon-hyun @HyukjinKwon
ok to test
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
Hi, @ajithme .
In general, we need one of the following:
- A unit test case.
- An explicit, reproducible procedure.
In the "Tested this patch manually" section of the PR description, could you elaborate on your procedure to verify this PR? Please write what you did on the command line, and show the results from before and after this PR.
Test build #116541 has finished for PR 27041 at commit
Could you check the UT failure, @ajithme ?
Sure, will update shortly.
Found the test case failures; please retest. Will update the PR description with manual steps for verification shortly.
Test build #116656 has finished for PR 27041 at commit
@dongjoon-hyun I have updated the PR with the test case failure correction, and also added a UT to reproduce and verify the issue. Please review.
Test build #116713 has finished for PR 27041 at commit
```diff
 val confMap = (hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue) ++
-  sparkConf.getAll.toMap ++ extraConfig).toMap
+  sparkConf.getAll.toMap ++ overriddenHiveProps ++ extraConfig).toMap
```
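The ordering in this hunk relies on Scala's `++` semantics for maps, where the right-hand operand wins on key collisions. A minimal, self-contained sketch of that precedence (illustrative names and values, not the actual Spark code):

```scala
// Sketch of last-wins precedence when concatenating config maps.
// On duplicate keys, Map.++ keeps the value from the right-hand map,
// so later sources override earlier ones:
// hadoopConf < sparkConf < overriddenHiveProps (--hiveconf) < extraConfig.
object ConfPrecedenceSketch {
  def mergeConfs(
      hadoopConf: Map[String, String],
      sparkConf: Map[String, String],
      overriddenHiveProps: Map[String, String],
      extraConfig: Map[String, String]): Map[String, String] =
    hadoopConf ++ sparkConf ++ overriddenHiveProps ++ extraConfig
}
```

With this order, a key set both in hive-site.xml (via hadoopConf) and on the command line resolves to the `--hiveconf` value, because `overriddenHiveProps` is concatenated after `hadoopConf` and `sparkConf`.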
Seems we should update https://github.com/apache/spark/pull/27041/files#diff-6fd847124f8eae45ba2de1cf7d6296feR170-R179 and also explain why extraConfig is at the end.
@yhuai Sure, I have updated the PR with reasonable pointers for the order. Does it suffice now?
Thank you. As `getConfSystemProperties` will get all of the hive confs that are in the system properties, it is possible that we will pull in a config that was not set by `--hiveconf`. Seems we are introducing a behavior change? Can you explain the impact of this change and why this change is fine?
Sure, even without my changes in this PR, `HiveConf` always considers the hive confs in system properties that were not set via `--hiveconf`, as part of the `HiveConf` constructor; refer to https://github.com/apache/hive/blob/rel/release-2.3.5/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L4079 (the same behaviour applies for hive 1.2.1 as well). Hence, this does not change any flow.
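As a rough illustration of the point above: HiveConf's constructor already applies hive confs found in the JVM system properties. A hedged sketch approximating that lookup with a simple `"hive."` prefix filter; the real `getConfSystemProperties` matches keys against HiveConf's registered ConfVars, so this is an approximation for illustration only:

```scala
// Approximation of HiveConf picking up hive confs from system properties.
// The real implementation matches keys against registered ConfVars;
// here a "hive." prefix filter stands in purely for illustration.
object HiveSystemPropsSketch {
  def hiveConfsFrom(sysProps: Map[String, String]): Map[String, String] =
    sysProps.filter { case (key, _) => key.startsWith("hive.") }
}
```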
Test build #116725 has finished for PR 27041 at commit
gentle ping @dongjoon-hyun @yhuai @cloud-fan @HyukjinKwon, can we get this fix into 3.0?
```scala
// not to lose command line overwritten properties,
// make a copy of the overridden props so that they can be reinserted at the end
val overriddenHiveProps = HiveConf.getConfSystemProperties.asScala
```
so we totally ignored `--hiveconf` previously?
`--hiveconf` was handled as part of the `HiveConf` constructor; refer to https://github.com/apache/hive/blob/rel/release-2.3.5/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L4079 .
That is, first it loads hive-site.xml and then it adds the `--hiveconf` properties on top.
But in Spark we again add hadoopConf on top of that, hence overwriting the HiveConf order.
Can you add more code comments to convince people that `HiveConf.getConfSystemProperties` contains only the `--hiveconf` entries?
@cloud-fan updated. Is the comment adequate now?
Test build #118644 has finished for PR 27041 at commit
Thanks for pinging me @cloud-fan. We can override hive configurations in many ways. This PR seems to prefer `--hiveconf`. Personally, I prefer that spark configurations always have higher precedence than the other types.
Thanks @yaooqinn for your thoughts. It seems a little confusing to know which setting overrides which, as per the documentation in https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html#running-the-thrift-jdbcodbc-server . I agree with your opinion of giving sparkConf the highest precedence, but the command line (`--hiveconf`) should be preferred over the config file (hive-site.xml). For the cases you mentioned (X marks the conf that is used):
So do you mean that in cases 1, 2 and 3, where a spark.* conf is used, it must get preference? I prefer that in case 4 `--hiveconf` has precedence, and in the remaining cases the spark conf can have higher precedence.
I have no strong opinion about case 4. BTW, you should also pay attention to
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
In `HiveClientImpl.scala`, when we create a new SessionState we prepare the configuration, and the options passed to `start-thriftserver.sh` via `--hiveconf` need to take precedence when creating `HiveConf`, i.e. in the order `hadoopConf < sparkConf < overriddenHiveProps (--hiveconf) < extraConfig`.
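A small before/after sketch of this ordering (illustrative keys and values, not the actual Spark code): before the patch the `--hiveconf` values were never re-applied after hadoopConf, so the hive-site.xml value won the merge.

```scala
// Before/after comparison of the config merge order (illustrative only).
object OrderComparisonSketch {
  val hadoopConf = Map("hive.server2.thrift.port" -> "10000") // from hive-site.xml
  val sparkConf  = Map.empty[String, String]
  val cliProps   = Map("hive.server2.thrift.port" -> "11000") // from --hiveconf
  val extra      = Map.empty[String, String]

  // Old order: the --hiveconf value is never re-applied, so hive-site.xml wins.
  val before = hadoopConf ++ sparkConf ++ extra
  // New order: the --hiveconf values are concatenated after sparkConf, so the
  // command line wins (and extraConfig can still override everything).
  val after = hadoopConf ++ sparkConf ++ cliProps ++ extra
}
```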
Why are the changes needed?
As per the documentation at https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html , a user can provide `--hiveconf` to override the hive configurations when using `start-thriftserver.sh`. But as per the code at https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L182 , the hive-site.xml properties (part of hadoopConf) override the configuration given on the command line, which is not the expected behaviour.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a UT; the base UTs also pass.